DOCUMENT RESUME 



ED 310 175 TM 014 014 

AUTHOR Byrne, Barbara M. 

TITLE Testing for Factorially Invariant Measuring
Instruments: A Reexamination and Application.
PUB DATE Aug 88 

NOTE 50p.; Paper presented at the Annual Meeting of the 

American Psychological Association (Atlanta, GA, 
August 12-16, 1988). 

PUB TYPE Speeches/Conference Papers (150) -- Reports -
Evaluative/Feasibility (142)

EDRS PRICE MF01/PC02 Plus Postage. 

DESCRIPTORS Academically Gifted; Analysis of Covariance; Analysis 
of Variance; Elementary Education; *Factor Analysis; 
Grade 5; Grade 8; Latent Trait Theory; Mathematical 
Models; *Measurement Techniques; Research 
Methodology; *Research Problems; Statistical 
Analysis; Test Reliability; *Test Validity 

IDENTIFIERS Confirmatory Factor Analysis; *Invariance; *LISREL
Computer Program; Perceived Competence Scale For 
Children 

ABSTRACT 

The paper identifies and addresses four
methodological weaknesses common to most previous studies that have
used LISREL confirmatory factor analysis to test for the factorial
validity and invariance of a single measuring instrument.
Specifically, the paper demonstrates the steps involved in: (1)
conducting sensitivity analyses to determine a statistically
best-fitting, yet substantively most meaningful baseline model; (2)
testing for partial measurement invariance; (3) testing for the
invariance of factor variances and covariances, given partial
measurement invariance; and (4) testing for the invariance of test
item and subscale reliabilities. These procedures are illustrated
with item response data from the Perceived Competence Scale for
Children from 129 normal and 132 gifted students in grade 5 and 113
normal and 117 gifted students in grade 8 from two public school
systems in Ottawa (Ontario). Seven tables present study data.
(Author/SLD)






Testing for Factorially Invariant Measuring Instruments:
A Reexamination and Application



Barbara M. Byrne 
University of Ottawa 



Paper presented at the American Psychological Association 
Annual Meeting, Atlanta, 1988 






Factorial Validity



Abstract 

The paper identifies and addresses four methodological
weaknesses common to most previous studies that have used
LISREL confirmatory factor analysis to test for the factorial
validity and invariance of a single measuring instrument.
Specifically, the paper demonstrates the steps involved in (a)
conducting sensitivity analyses to determine a statistically
best-fitting, yet substantively most meaningful baseline model,
(b) testing for partial measurement invariance, (c) testing for
the invariance of factor variances and covariances, given
partial measurement invariance, and (d) testing for the
invariance of test item and subscale reliabilities. These
procedures are illustrated with item response data from normal
and gifted children in grades 5 and 8, based on the Perceived
Competence Scale for Children.






Testing the Factorial Validity and Invariance of a Measuring
Instrument Using LISREL Confirmatory Factor Analyses:
A Reexamination and Application

In substantive research, an important assumption in
single-group analyses is that the assessment instrument is
measuring that which it was designed to measure (i.e., it is
factorially valid), and in multigroup analyses, that it is
doing so in exactly the same way across independent samples
(i.e., it is factorially invariant). Traditionally, the factor
structure of a measuring instrument has been validated by means
of exploratory factor analysis (EFA), and its invariance tested
by the comparison of EFA factors across groups using diverse ad
hoc procedures (for a review, see Marsh & Hocevar, 1985;
Reynolds & Harding, 1983). At this point in time, however, the
limitations of EFA are widely known (see e.g., Fornell, 1983;
Long, 1983; Marsh & Hocevar, 1985), as are the issues related
to tests of factorial invariance based on EFA factors (see
Alwin & Jackson, 1981).

A methodologically more sophisticated and statistically
more powerful technique for such analyses is the confirmatory
factor analytic (CFA) procedure proposed by Joreskog (1969),
and now commercially available through the LISREL VI computer
program (Joreskog & Sorbom, 1985). The LISREL CFA approach
allows researchers to test a series of hypotheses related to
(a) the factorial validity of an assessment instrument, and (b)
the equivalency of its factorial structure and measurements
across groups. While a number of construct validity studies
have applied the technique to multitrait-multimethod analyses
of assessment measures (e.g., Bachman & Palmer, 1981; Flamer,
1983; Forsythe, McGaghie, & Friedman, 1986; Marsh & Hocevar,
1984; Watkins & Hattie, 1981), few have used it to evaluate the
factorial validity or factorial invariance of a single
measuring instrument; of these, most have been incomplete in
terms of model-fitting procedures and tests of invariance. The
purpose of the present paper, in broad terms, is to address
these limitations in a demonstration of LISREL CFA procedures
for testing the factorial validity and invariance of a single
measuring instrument.

LISREL Confirmatory Factor Analysis 
Factor analysis, in general terms, is a statistical
procedure for determining whether covariation among a set of
observed variables can be explained by a smaller number of
latent variables (i.e., factors). In contrast to EFA, where the
only hypothesis tested concerns the number of factors
underlying the observed data (Bentler, 1978), CFA permits the
testing of several hypotheses, the number and degree of
specificity being determined by the investigator. As such,
based on his/her knowledge of theoretical and empirical
research, the investigator postulates, a priori, a particular
factor analytic model and then tests the model to determine
whether or not it is consistent with the observed data;
minimally, model specifications would include the number of
latent factors, the pattern of factor loadings, and relations
among the latent factors.

The LISREL CFA framework incorporates two conceptually
distinct models: a measurement model and a structural model.
The first of these specifies how the observed (i.e., measured)
variables relate to the underlying latent (i.e., unobserved,
unmeasured) factors; the second specifies relations among the
latent factors themselves. In LISREL notation, this means that,
typically, the factor loading (lambda, Λ), error (theta, Θ) and
latent factor variance-covariance (phi, Φ) matrices are of
primary importance. More specifically, Λ is a matrix of
coefficients regressed from latent factors to observed
variables, and Θ is the variance-covariance matrix of
error/uniquenesses. These matrices make up the measurement
aspect of the model. Φ is the factor variance-covariance matrix
and constitutes the structural part of the model.² Since a
number of papers are available to readers that (a) specify the
statistical theory underlying LISREL CFA (e.g., Joreskog, 1969;
Long, 1983), (b) outline basic notation and steps in using the
LISREL program (e.g., Lomax, 1982; Long, 1983; Wolfle, 1981),
and (c) summarize advantages of LISREL CFA over traditional EFA
procedures (e.g., Long, 1983; Marsh & Hocevar, 1985), these
details are not provided here.
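
In equation form, the two-part specification described above can be sketched as follows. This is the standard CFA notation found in the sources cited, not a formula quoted from the present paper; x (observed variables), ξ (latent factors), δ (error/uniqueness terms) and Σ (the model-implied covariance matrix) are symbols assumed here for illustration.

```latex
x = \Lambda \xi + \delta, \qquad
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta
```

Under this sketch, Λ and Θ carry the measurement part of the model and Φ carries the structural part, which is the division the text describes.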

The process of validating the factorial structure of a
measuring instrument and then testing for its invariance across
groups involves two separate analytical procedures; the first
is a prerequisite for the second. The initial step entails the
estimation of a baseline model; since this procedure involves
no between-group constraints, the data are analyzed separately
for each group. The baseline model represents the most
parsimonious, yet substantively meaningful and best-fitting
model to the data. Since instruments are often group-specific
in the way they operate, these models are not expected to be
identical across groups. For example, whereas the baseline
model for one group might include correlated measurement errors
and/or secondary factor loadings, this may not be so for the
second group.³ A priori knowledge of such group differences, as
will be illustrated later, is critical in testing for
equivalencies across groups.

Having determined the baseline model for each group, the
investigator may then proceed to tests of factorial invariance.
Since these analyses involve the imposition of constraints on
particular parameters, the data from all groups must be
analyzed simultaneously to obtain efficient estimates (Joreskog
& Sorbom, 1985). It is important to note, however, that the
pattern of fixed and free parameters remains consistent with
the baseline model specification for each group. (For a review
of LISREL CFA invariance testing applications, see Byrne,
Shavelson & Muthen, in press; for details of the procedure in
general, see Alwin & Jackson, 1981; Byrne et al., in press;
Joreskog, 1971a; Marsh & Hocevar, 1985; Rock, Werts & Flaugher,
1978).

A review of previous studies using CFA LISREL procedures to
validate assessment measures reveals several limitations.
First, with three exceptions (Byrne, in press; Marsh, 1987b;
Tanaka & Huba, 1984), researchers have not considered alternate
model specifications beyond the one initially hypothesized (see
Benson, 1987; Marsh, 1985, 1987a; Marsh & Hocevar, 1985; Marsh
& O'Neill, 1984; Marsh, Smith & Barnes, 1985). In other words,
researchers have (a) postulated a model, (b) tested its fit to
the observed data, (c) argued for the adequacy of model fit,
and (d) evaluated factorial validity on the basis of this a
priori model. Such validity claims, however, may be considered
dubious for at least two reasons: (a) in many cases, model fit
was only marginally good, and (b) these models did not allow
for sample-specific artifacts such as nonrandom measurement
error (i.e., correlated error) and/or secondary factor
loadings, two findings not uncommon to measures of
psychological constructs (see e.g., Byrne, in press; Byrne &
Shavelson, 1986; Huba, Wingard, & Bentler, 1981; Newcomb, Huba,
& Bentler, 1986; Tanaka & Huba, 1984). More appropriately,
model fitting should continue beyond the initially hypothesized
model until a statistically best-fitting model is determined;
additional analyses can then be conducted to establish which
parameters are statistically, as well as substantively,
important to the CFA model. In so doing, both practical and
statistical significance are taken into account (Muthen,
personal communication, January, 1987; see also, Huba et al.,
1981; Tanaka & Huba, 1984).⁵

While some have criticized such post hoc model-fitting
practices (e.g., Browne, 1982; Fornell, 1983; MacCallum, 1987),
Tanaka and Huba (1984) have argued that the process can be
substantively meaningful. For example, if the estimates of
major parameters undergo no appreciable change when minor
parameters are added to the model, this is an indication that
the initially hypothesized model is empirically robust; the
more fitted model therefore represents a minor improvement to
an already adequate model and the additional parameters should
be deleted from the model. If, on the other hand, the major
parameters undergo substantial alteration, the exclusion of the
post hoc parameters may lead to biased estimates (Alwin &
Jackson, 1980; Joreskog, 1983); the minor parameters should
therefore be retained in the model.

One method of estimating the practical significance of post
hoc parameters is to correlate major parameters (the λ's and
φ's) in the initially hypothesized model with those in the
best-fitting post hoc model (c.f. Marsh, 1987b). Coefficients
close to 1.00 argue for the stability of the initial model and
thus, the triviality of the minor parameters in the post hoc
model. In contrast, coefficients that are not close to 1.00
(say, <.90) are an indication that the major parameters were
adversely affected, and thus argue for the inclusion of the
post hoc parameters in the final baseline model.
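
The correlational check just described can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's original implementation: the function names are hypothetical, the loading values are invented for the example, and the .90 cutoff is the rough guide mentioned in the text rather than a formal criterion.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of estimates."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def post_hoc_parameters_matter(initial, post_hoc, cutoff=0.90):
    """True when the major parameters shift enough (r below the cutoff)
    to argue for retaining the post hoc parameters."""
    return pearson_r(initial, post_hoc) < cutoff


# Hypothetical factor loadings (not taken from the paper's tables).
initial_loadings = [0.62, 0.71, 0.55, 0.68, 0.74, 0.59]
stable_post_hoc = [0.61, 0.72, 0.54, 0.69, 0.73, 0.60]
shifted_post_hoc = [0.70, 0.48, 0.66, 0.41, 0.77, 0.52]

print(post_hoc_parameters_matter(initial_loadings, stable_post_hoc))
print(post_hoc_parameters_matter(initial_loadings, shifted_post_hoc))
```

The same function could be applied to the factor variance-covariance estimates; the two sets of parameters are kept separate so that measurement and structural stability can be judged independently.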

A second limitation of previous research relates to tests
of factorial invariance. In particular, researchers have
conducted such tests at the matrix level only; when confronted
with a noninvariant Λ or Φ, they have not continued testing to
determine the aberrant parameter(s) that contributed to the
noninvariance (see Benson, 1987; Marsh, 1985, 1987b; Marsh &
Hocevar, 1985; Marsh et al., 1985). Consequently, readers are
left with the impression that given a noninvariant pattern of
factor loadings, further testing of invariance is unwarranted.
This conclusion, however, is unfounded when the model
specification includes multiple indicators of a construct
(Muthen & Christoffersson, 1981). (For an extended discussion,
review of the literature, and application, see Byrne et al., in
press; for an application involving dichotomous variables, see
Muthen & Christoffersson, 1981).

In examining factorial validity, partial measurement
invariance is important because it bears directly on further
testing of measurement and/or structural equivalencies. For
example, the researcher may wish to test whether the
theoretical structure of the underlying construct is equivalent
across groups; the invariance of factor covariances, then, is
of primary interest (see e.g., Marsh, 1985; Marsh & Hocevar,
1985). Alternatively, the investigator may be interested in
testing for the invariance of item or subscale reliabilities; in
this case, the invariance of factor variances is of interest
(see Cole & Maxwell, 1985; Rock et al., 1978). In testing for
the invariance of factor variances and covariances, equality
constraints are imposed on only those factor loadings known to
be invariant across groups; this may include all, or only a
portion of the factor loading parameters.

A final limitation concerns studies that have investigated
the invariance of item (Benson, 1987; Marsh, 1985, 1987b; Marsh
& Hocevar, 1985; Marsh et al., 1985) or subscale (Byrne &
Shavelson, 1987) reliabilities across groups. Three additional
studies (Corcoran, 1980; Hare & Mason, 1980; Wolfle &
Robertshaw, 1983) are reported here for sake of completeness;
their focus, however, was on the equivalence of response
error, rather than on specific test item or subscale
reliabilities. Each of these studies tested for the invariance
of measurement reliabilities by placing constraints on both the
λ and the θ parameters. However, this procedure is valid only
when the factor variances are known to be equivalent across
groups (Cole & Maxwell, 1985; Rock et al., 1978). When
variances are noninvariant, it is necessary to check the ratio
of true and error variances in testing for the equivalence of
reliabilities (see Werts, Rock, Linn, & Joreskog, 1976).
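
A small numerical sketch may help show why equal λ and θ parameters do not by themselves guarantee equal reliabilities. It assumes the usual definition of an indicator's reliability as the ratio of true variance (λ²φ) to total variance (λ²φ + θ); the function name and all numbers are hypothetical and are not taken from the paper.

```python
def indicator_reliability(loading, factor_variance, error_variance):
    """Ratio of true variance to total variance for one indicator.

    Assumes true variance = loading**2 * factor_variance.
    """
    true_variance = loading ** 2 * factor_variance
    return true_variance / (true_variance + error_variance)


# Same loading and error variance in both groups (hypothetical values),
# but the factor variance differs across groups.
loading, error_variance = 0.80, 0.36
reliability_group1 = indicator_reliability(loading, 1.00, error_variance)
reliability_group2 = indicator_reliability(loading, 2.00, error_variance)

print(round(reliability_group1, 3), round(reliability_group2, 3))
```

In this sketch the two groups share identical loading and error values, yet the group with the larger factor variance shows the higher reliability, which is why the equivalence of factor variances has to be tested rather than assumed.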

In sum, four methodological weaknesses are evident with
previous LISREL CFA validity studies of measuring instruments.
First, model-fitting procedures have been incomplete in the
determination of adequately specified baseline models. Second,
testing for partial measurement invariance has not been
considered. Third, given the failure to test for, and identify
partially invariant item scaling units, researchers have not
been able to proceed with testing for the invariance of
structural parameters. Finally, tests for the invariance of
item (or subscale) reliabilities have assumed, rather than
tested for, the equivalency of factor variances. As such,
testing for the invariance of reliabilities has been
incomplete, and in many cases, incorrectly executed. The
purpose of this paper is to address these limitations by
demonstrating the steps involved in: (a) conducting a
sensitivity analysis to determine a baseline model that is
statistically best-fitting, yet substantively most meaningful,
(b) testing for, and testing with partial measurement
invariance, and (c) testing for the invariance of subscale and
item reliabilities.

Application of LISREL Confirmatory Factor Analyses

The Measuring Instrument 

The Perceived Competence Scale for Children (Harter, 1982)
is used here for demonstration purposes. This 28-item
self-report instrument measures four facets of perceived
competence: cognitive competence (i.e., academic ability),
physical competence (i.e., athletic ability), social competence
(i.e., social acceptance by peers), and general self-worth
(i.e., global self-esteem). Each 7-item subscale has a 4-point
"structured alternative" question format ranging from not very
competent (1), to very competent (4). (For a summary of
psychometric properties, see Byrne & Schneider, 1988; Harter,
1982).

Data Base 

Data for the present demonstration came from a larger study
that examined social relation differences between gifted
students and their non-gifted peers (see Schneider, Clegg,
Byrne, Ledingham, & Crombie, in press). Following listwise
deletion of missing data, the sample for the present paper
comprised 261 grade 5 (129 normal, 132 gifted) and 230 grade 8
(113 normal, 117 gifted) children from the two public school
systems in Ottawa, Canada. Overall, an examination of item
skewness and kurtosis revealed a distribution that was
approximately normal for each group (see Muthen & Kaplan,
1985). (For details concerning descriptive statistics,
selection criteria and sampling procedures, see Byrne &
Schneider, 1988).
Analysis of the Data 

Analyses are conducted in two major stages. First, the
factorial validity of the PCSC is tested separately for grades
5 and 8 in the normal and gifted samples, and a baseline model
established for each of the four groups. Second, tests for the
factorial invariance of item responses across grade are
conducted separately for the normal and gifted samples.

Analyses are based on an item-pair structure (with the
exception of one item in each subscale). As such, the seven
items in each subscale are paired off, with items 1 and 2
forming the first couplet, items 3 and 4 the second couplet,
and items 5 and 6 the third couplet; item 7 remains a
singleton. The decision to use item-pairs was based on two
primary factors: (a) the low ratio of number of subjects per
test item for each subsample, and (b) preliminary EFA results
derived from single-item analyses indicating, for the most
part, that items were reasonably homogeneous in their
domain-specific measurements of perceived competence (see Byrne
& Schneider, 1988). Furthermore, Marsh, Barnes, Cairns, &
Tidman (1984) have argued that the analysis of item-pairs is
preferable to single items for at least four additional
reasons; item-pair variables are likely to: (a) be more reliable,
(b) contain less unique variance since they are less affected
by the idiosyncratic wording of individual items, (c) be more
normally distributed, and (d) yield results having a higher
degree of generalizability.
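
The pairing scheme can be illustrated with a short Python sketch. The text does not say whether paired items were summed or averaged, so summation is an assumption made here for illustration; the function name and the response values are likewise hypothetical.

```python
def to_item_pairs(subscale_responses):
    """Collapse seven item scores into three couplet scores and a singleton.

    Couplets are formed from items 1-2, 3-4 and 5-6; item 7 stays alone.
    Summing the paired items is an assumption, not the paper's stated rule.
    """
    if len(subscale_responses) != 7:
        raise ValueError("each PCSC subscale has seven items")
    r = subscale_responses
    return [r[0] + r[1], r[2] + r[3], r[4] + r[5], r[6]]


# Hypothetical responses of one child to one 7-item subscale (1-4 scale).
responses = [3, 4, 2, 3, 4, 4, 1]
indicators = to_item_pairs(responses)
print(indicators)  # [7, 5, 8, 1]
```

Applied to all four subscales, this yields 16 observed variables per child (12 item-pairs and 4 singletons) in place of the 28 original items.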

The CFA model in the present study hypothesizes a priori
that: (a) responses to the PCSC can be explained by four
factors, (b) each item-pair (and item singleton) has a non-zero
loading on the perceived competence factor that it is designed
to measure (i.e., target loading), and zero loadings on all
other factors (i.e., non-target loadings), (c) the four factors
are correlated, and (d) error/uniqueness terms for the
item-pair (and item singleton) variables are uncorrelated.
Parameter specifications are summarized in Table 1.
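
As a rough check on what this specification implies, the following Python sketch builds the target-loading pattern for 16 indicators on four factors and counts the model's degrees of freedom. It assumes that each factor's scale is set by fixing one loading (fixing the factor variance instead gives the same count); the function names are hypothetical and the resulting figures are an inference from the description above, not values quoted from Table 1.

```python
def target_pattern(n_factors, indicators_per_factor):
    """Rows are indicators, columns are factors; 1 marks a target loading,
    0 marks a non-target loading fixed at zero."""
    rows = []
    for factor in range(n_factors):
        for _ in range(indicators_per_factor):
            rows.append([1 if col == factor else 0 for col in range(n_factors)])
    return rows


def degrees_of_freedom(pattern, extra_free_parameters=0):
    """Distinct variances/covariances minus freely estimated parameters."""
    n_indicators = len(pattern)
    n_factors = len(pattern[0])
    moments = n_indicators * (n_indicators + 1) // 2
    # One target loading per factor is fixed to set the scale (assumption).
    free_loadings = sum(sum(row) for row in pattern) - n_factors
    factor_variances_covariances = n_factors * (n_factors + 1) // 2
    error_variances = n_indicators  # uncorrelated error/uniqueness terms
    free = (free_loadings + factor_variances_covariances
            + error_variances + extra_free_parameters)
    return moments - free


pattern = target_pattern(n_factors=4, indicators_per_factor=4)
df_hypothesized = degrees_of_freedom(pattern)
df_three_post_hoc = degrees_of_freedom(pattern, extra_free_parameters=3)
print(df_hypothesized, df_three_post_hoc)  # 98 95
```

Each post hoc parameter that is freed, whether an error covariance or a secondary loading, costs one degree of freedom, which is what makes the nested-model comparisons in the following sections possible.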



Insert Table 1 about here



Covariance structure analysis has traditionally relied on
the χ² likelihood ratio test as a criterion for assessing the
extent to which a proposed model fits the observed data; a
nonsignificant χ² indicates a well-fitting model. However, the
sensitivity of the χ² statistic to sample size, as well as to
various model assumptions (i.e., linearity, multinormality,
additivity) is now well known (see e.g., Bentler & Bonett,
1980; Fornell, 1983; Huba & Harlow, 1987; Joreskog, 1982; Marsh
& Hocevar, 1985; Muthen & Kaplan, 1985; Tanaka, 1987). As an
alternative to χ², other goodness-of-fit indices have been
proposed (see e.g., Bentler & Bonett, 1980; Hoelter, 1983;
Tanaka & Huba, 1985; Tucker & Lewis, 1973). Researchers,
however, have been urged not to judge model fit solely on the
basis of χ² values (Bentler & Bonett, 1980; Joreskog & Sorbom,
1985), or on alternative fit indices (Sobel & Bohrnstedt,
1985); rather, assessments should be based on multiple
criteria, including "substantive, theoretical and conceptual
considerations" (Joreskog, 1971, p. 421; see also, Sobel &
Bohrnstedt, 1985).

Assessment of model fit in the present example is based on
(a) the χ² likelihood ratio test, (b) the χ²/df ratio, (c)
T-values, normalized residuals and modification indices
provided by LISREL VI, and (d) knowledge of substantive and
theoretical research in this area.
Fitting the Baseline Model

Since parameter specifications for the hypothesized
4-factor model do not include equality constraints between
various subsamples, all analyses are performed on the observed
correlation matrix for each group. Results of the model-fitting
process are reported in Tables 2 and 3 for the normal and
gifted samples, respectively.

Normal sample. As shown in Table 2, the initial model
(Model 1) represented a fairly reasonable fit to the observed
data for grade 5 students (χ²/df = 1.55). Nonetheless, an
examination of the modification indices revealed three
off-diagonal values in the Θ matrix that were greater than 5.00
(see Joreskog & Sorbom, 1985). These parameters represented
error covariances between item variables, both within (PSC4,
PSC2) and across (PPC4, PSC1; PCC1, PGS3) subscales. Such
findings, as noted earlier, are often encountered with models
of psychological phenomena, but are particularly evident when
the model represents items (i.e., observed variables) and
subscale factors (i.e., latent variables) from a single
measuring instrument (see e.g., Byrne, in press; Byrne &
Shavelson, 1987); error covariances in these instances are
considered substantively plausible since they indicate
nonrandom error introduced by a particular measurement method
such as item format.



Insert Table 2 about here 



To determine the statistical and practical significance of
these error covariances, then, model fitting continued with the
specification of three alternative models (Models 2-4). In each
model, the error covariance in question was specified as a
free, rather than as a fixed parameter. Since a difference in χ²
(Δχ²) for competing (i.e., nested) models is itself
χ²-distributed with degrees of freedom equal to the difference in
degrees of freedom, this indicator is used to judge whether the
reestimated model resulted in a statistically significant
improvement in fit. Model 4 ultimately yielded the model of
best fit (χ²(95) = 117.57, p > .05; χ²/df = 1.24) and also
demonstrated a significant improvement in fit (Δχ²(1) = 8.96,
p < .01).
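
The nested-model comparison described above can be reproduced with a short, dependency-free Python sketch. The function name `chi2_sf` is hypothetical; it computes the upper-tail probability of a χ² variate for integer degrees of freedom using standard closed-form series, and is offered as an illustration under that assumption rather than as the routine LISREL itself uses.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate
    with a positive integer number of degrees of freedom."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**i / i!.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return exp(-y) * total
    # Odd df: erfc term plus a finite sum over half-integer gamma values.
    total = sum(y ** (i - 0.5) / gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(y)) + exp(-y) * total


# Values reported in the text for the grade 5 normal sample.
p_overall_fit = chi2_sf(117.57, 95)  # overall fit of Model 4
p_difference = chi2_sf(8.96, 1)      # gain from freeing one error covariance

print(p_overall_fit > 0.05, p_difference < 0.01)
```

A nonsignificant overall χ² together with a significant Δχ² is the pattern reported for Model 4: the model fits acceptably, and the last parameter freed improved fit by more than chance alone would suggest.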

However, given the known sensitivity of the χ² statistic
discussed earlier, some researchers have preferred to look at
differences between (a) the absolute magnitude of estimates
(Werts et al., 1976), (b) the magnitude of estimates expressed
as χ²/df ratios (see e.g., Marsh & Hocevar, 1985), or (c) the
χ²/df ratios of nested models, as a more realistic index of
model improvement (see e.g., Marsh, 1985, 1987b). An
examination of differences between the χ²/df ratios in the
present data showed values of .11, .12 and .08 (Models 2-4,
respectively), suggesting that the impact of the post hoc
parameters on the specified model was fairly trivial. This
notion was supported by three additional pieces of evidence.
First, the error covariance estimates, while statistically
significant (T-values > 2.00), were of relatively minor
magnitude (mean θ = .06). Second, visual inspection of the
factor loadings and factor covariances in Models 1 and 4
revealed little fluctuation in their estimated values. Third,
the factor loadings in Model 1 were highly correlated with
those in Model 4 (r = .95); likewise, for correlations computed
between the factor variance-covariances (r = .99). Since the
addition of the error covariance parameters to the model
altered neither the measurement parameters (see Bagozzi, 1983),
nor the structural parameters (see Fornell, 1983), their impact
on the model was clearly trivial. These results thus verified
the parameter stability of the initially hypothesized model;
Model 1 was, therefore, considered as baseline for grade 5 in
all subsequent analyses.

The hypothesized 4-factor model for grade 8, as shown in
Table 2, represented a good fit to the data (χ²/df = 1.35).
Although an examination of the modification indices suggested
possible model-fit improvement if error terms between two item
variables were allowed to covary, the fit differential was not
statistically significant (Δχ²(1) = 3.33, p > .05); Model 1,
therefore, was considered baseline for the grade 8 normal
sample.

Gifted sample. Model-fitting results for the gifted
differed substantially from those for their normal peers. These
results are presented in Table 3. Let us look first at the fit
statistics for grade 5. We can see that the initially
hypothesized 4-factor model (Model 1) does not represent a
particularly good fit to the data (χ²(98) = 160.43). To
investigate the misfit, model fitting proceeded as before with
the normal sample. A substantial drop in χ² was found when item
PPC4 (Δχ²(1) = 25.57, p < .001) and item PGS4 (Δχ²(1) = 17.99,
p < .001) were free to cross-load on the social (PSC) and
cognitive (PCC) factors, respectively.



Insert Table 3 about here 



In contrast to the post hoc error covariances encountered
with the normal sample, these parameters represented fairly
major alterations to the initial 4-factor model and bear
importantly on the factorial validity of the Harter instrument.
The decision to accept Model 3 as baseline for the grade 5
gifted was based on three primary considerations. First, the
secondary loadings of PPC4 on the PSC factor, λ(16,3), and PGS4
on the PCC factor, λ(4,2), were both highly significant (T-values
= 4.97; 4.09, respectively) and of fairly high magnitude (λ =
.61; .65, respectively). Second, the factor loading correlation
between Models 1 and 3 was .68, suggesting that the Model 1
measurement estimates were somewhat unstable; the structural
parameters, on the other hand, appeared to be very stable (r =
.99). Finally, the findings were consistent with an earlier EFA
of the data which indicated evidence of the same cross-loading
pattern (see Byrne & Schneider, 1988).

A review of the model-fitting results for grade 8 (see
Table 3) reveals the secondary factor loadings noted earlier
to be common to both groups of gifted students. However, a
well-fitting model for the grade 8 subsample was realized only
when two further restrictions on the hypothesized model (Model
1) were relaxed; these included one error covariance between
Item 4 and Item-pair 1 on the perceived cognitive competence
subscale (PCC4, PCC1; Δχ²(1) = 25.74, p < .001) and one secondary
factor loading (PGS2 on PSC; Δχ²(1) = 14.14, p < .001).

Following these analyses, Model 5 was considered baseline
for the grade 8 gifted. As with the previous subsamples, this
decision was linked to several factors. First, the secondary
loadings of PPC4, PGS4 and PGS2 on the PSC, PCC and PSC
factors, respectively, were statistically significant (T-values
= 4.74, 4.05, 3.80, respectively); the factor loading estimates
were also of substantial magnitude (λ = .45, .35, .34,
respectively). Second, the error covariance estimate, unlike
those for the normal sample, was highly significant (T-value =
5.76) and fairly large (θ = .43); given the size of this
estimate, it was considered risky to constrain the parameter to
zero since this specification could have an important biasing
effect on other parameters in the model (Alwin & Jackson, 1980;
Joreskog, 1983). Third, fluctuation of the factor loading
estimates, albeit more modest than for grade 5, was evident
between Models 1 and 5; this instability was verified by a
correlation of .87 between λ parameters in the two models; as
with the grade 5 findings, the structural parameters were shown
to be fairly stable (r = .94). Finally, the cross-loading of
factors for the grade 8 sample was consistent with findings by
Byrne and Schneider in the EFA study noted earlier.
Testing for Invariance 

Tests of invariance involved specifying a model in which
certain parameters were constrained to be equal across groups
and then comparing that model with a less restrictive model in
which these parameters were free to take on any value. As with
model-fitting, the Δχ² between competing models provided a basis
for determining the tenability of the hypothesized equality
constraints, a significant Δχ² indicating noninvariance. Unlike
the model-fitting analyses, however, the simultaneous
estimation of parameters was based on the covariance, rather
than on the correlation matrix for each group (see Joreskog &
Sorbom, 1985).⁷ For purposes of the present demonstration,
invariance-testing procedures are applied to the gifted sample
only, since it is the more interesting of the two samples in
terms of model specification; analyses focus on equivalencies
across grades 5 and 8. We first test for the equality of item
scaling units (i.e., factor loadings; λ's), components of the
measurement model. Once we have determined which item pairs
(and/or single items) are invariant, we can then proceed with
tests for the equality of subscale (i.e., factor) covariances,
components of the structural model. Finally, we test for the
equality of subscale and item reliabilities.

As noted earlier, once baseline models are determined, any
discrepancies in parameter specifications across groups remain
so throughout the analyses. In the present application, for
example, the secondary loading in the Λ matrix, λ(2,3), and the
error covariance in the Θ matrix, θ(8,5), for grade 8, remained
unconstrained for all tests of invariance. The baseline model
parameter estimates for the grades 5 and 8 gifted are
summarized in Tables 4 and 5, respectively.




Insert Tables 4 and 5 about here



Equality of item scaling units. Since the initial
hypothesis of equality of covariance matrices was rejected
(χ² = 209.81, p < .001), invariance testing proceeded, first, to
the equivalence of item scaling units. These results are
summarized in Table 6.



Insert Table 6 about here



The simultaneous 4-factor solution for each group yielded a
reasonable fit to the data (χ²(190) = 232.08). These results
suggest that for both grades, the data were well described by
the four perceived competence factors.⁸ This finding, however,
does not necessarily imply that the actual factor loadings are
the same across grade. Thus, the hypothesis of an invariant
pattern of loadings was tested by placing equality constraints
on all lambda parameters (including the two common secondary
loadings, λ₁₆,₃ and λ₄₂, but excluding λ₂₃, the secondary loading
specific to grade 8), and then comparing this model (Model 2)
with Model 1, in which only the number of factors was held
invariant. The difference in χ² was highly significant
(Δχ²(14) = 38.93, p < .001); thus, the hypothesis of an
equivalent pattern of scaling units was untenable.

In order to identify which scaling units were noninvariant,
and thus detect partial measurement invariance, it seemed
prudent to first determine whether or not the two common
secondary loadings were invariant across grade. As such,
equality constraints were imposed on λ₁₆,₃ and λ₄₂, and the model
reestimated; this hypothesis was found tenable (Δχ²(2) = 5.10,
p > .05). Tests of invariance proceeded next to (a) test each
congeneric set of scaling units (i.e., parameters specified as
loading on the same factor) and then, given findings of
noninvariance, to (b) examine the equality of each item scaling
unit individually. For example, in testing for the equality of
all scaling units measuring perceived general self (PGS), λ₂₁,
λ₃₁, and λ₄₁, as well as λ₁₆,₃ and λ₄₂, were held invariant across
groups. Given that this hypothesis was untenable (Δχ²(5) = 24.66,
p < .001), each factor loading (λ₂₁, λ₃₁, λ₄₁) was tested
independently to determine whether it was invariant across
grade; λ₁₆,₃ and λ₄₂ were also held concomitantly invariant. These
analyses detected one item scaling unit (PGS2; λ₂₁) to be
noninvariant across grade.

In a similar manner, the scaling units of all remaining
item pairs (or singletons) were tested for invariance across
grade. As can be seen in Table 6, invariant factor loadings
were held cumulatively invariant, thus providing an extremely
powerful test of factorial invariance. In total, only two item
scaling units were found to be nonequivalent: one item pair
measuring perceived general self (PGS2; λ₂₁) and one single item
measuring perceived social competence (PSC4; λ₁₂,₃).
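
The sequence of difference tests summarized in Table 6 can be
retraced numerically. The sketch below is illustrative only: it
is not the LISREL procedure itself, the names `chi2_sf`, `BASE`,
`MODELS`, and `difference_tests` are hypothetical, and the
upper-tail probability is obtained from a standard series
expansion of the incomplete gamma function. The χ² and df values
are those reported in Table 6, each compared with Model 1.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma function:
    # sum over n of z^n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# (chi-square, df) for the competing models in Table 6; Model 1 is the comparison model.
BASE = (232.08, 190)
MODELS = {
    3: (237.18, 192), 4: (256.74, 195), 5: (254.33, 193), 6: (239.47, 193),
    7: (240.37, 194), 8: (244.35, 197), 9: (251.37, 200), 10: (245.20, 198),
    11: (245.45, 199), 12: (248.69, 202),
}

def difference_tests(alpha=0.05):
    """Return {model: (delta_chi2, delta_df, p)} and the models rejected at alpha."""
    results, rejected = {}, []
    for model, (chi2, df) in sorted(MODELS.items()):
        d_chi2 = round(chi2 - BASE[0], 2)
        d_df = df - BASE[1]
        p = chi2_sf(d_chi2, d_df)
        results[model] = (d_chi2, d_df, p)
        if p < alpha:
            rejected.append(model)
    return results, rejected

if __name__ == "__main__":
    results, rejected = difference_tests()
    for model, (d_chi2, d_df, p) in results.items():
        print(f"Model {model}: delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.4f}")
    print("Equality constraints rejected for models:", rejected)
```

Run as a script, this flags Models 4, 5, and 9, which agrees with
the conclusion that PGS2 and PSC4 carry the noninvariant scaling
units.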

Equality of factor covariances. The first step in testing
for the invariance of structural relations among subscales was
to constrain all factor covariances to be equal across grade.
Equality constraints were subsequently imposed, independently,
on each of the phi parameters. It is important to note that
partial measurement invariance was maintained throughout these
testing procedures. In other words, the following measurement
parameters were held invariant while testing for the equality
of the factor covariances: the two common secondary factor
loadings (λ₁₆,₃, λ₄₂), and all factor loadings except λ₂₁ and λ₁₂,₃.

The hypothesis of equivalent factor covariances was found
tenable (Δχ² = 5.12, p > .05).⁹ If, on the other hand, the
hypothesis had been found untenable, the researcher would want
to investigate further the source of this noninvariance. Thus,
as demonstrated with tests of item scaling units, he/she would
proceed to test, independently, each factor covariance
parameter in the Φ matrix; model specification, of course, would
include the partially invariant measurement parameters.
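
As a bookkeeping check on the partially invariant specification
just described, the constrained loadings can be enumerated. This
is a minimal sketch, not LISREL syntax; the (item number, factor
number) tuples and all variable names are assumptions made for
illustration. It follows Table 1 in treating the first loading on
each factor as fixed to 1.0.

```python
# Items 1-16 load on factors 1-4 in blocks of four (PGS, PCC, PSC, PPC), as in Table 1.
primary = [(item, (item - 1) // 4 + 1) for item in range(1, 17)]

# The first item of each block is the reference loading fixed to 1.0, so it is not estimated.
free_primary = [(i, f) for (i, f) in primary if (i - 1) % 4 != 0]

# Secondary loadings common to both grades: PPC4 on PSC and PGS4 on PCC.
common_secondary = [(16, 3), (4, 2)]

# Loadings found to be noninvariant across grade: PGS2 (lambda 2,1) and PSC4 (lambda 12,3).
noninvariant = {(2, 1), (12, 3)}

constrained = [p for p in free_primary + common_secondary if p not in noninvariant]

# Number of cross-grade equality constraints placed on factor loadings.
print(len(constrained))
```

The count of constrained loadings equals the difference in
degrees of freedom between Model 12 (df = 202) and Model 1
(df = 190) in Table 6.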

Equality of reliabilities. Generally speaking, in
multiple-indicator CFA models, testing for the invariance of
reliability is neither necessary (Jöreskog, 1971b), nor of
particular interest when the scales are used merely as CFA
indicators and not as measures in their own right, ignoring
reliability (Muthén, personal communication, October 1987).
Although Jöreskog (1971a) demonstrated the steps involved in
testing for a completely invariant model (i.e., invariant Λ, Φ,
and Θ), this procedure is considered an excessively stringent
test of factorial invariance (Muthén, personal communication,
January 1987). In fact, Jöreskog (1971b) has shown that while
it is necessary that multiple measures of a latent construct be
congeneric (i.e., believed to measure the same construct), they
need not exhibit invariant variances and error/uniquenesses
(see also Alwin & Jackson, 1980).
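
For reference, the congeneric measurement model invoked here can
be written as follows. This is the textbook form in LISREL-style
notation, offered as a sketch rather than as the paper's own
display; the group superscript (g) is a label introduced here, φ
denotes the factor variance, and θ the error variance of item i.

```latex
x_i^{(g)} = \lambda_i^{(g)}\,\xi^{(g)} + \delta_i^{(g)}, \qquad
\operatorname{Var}\bigl(x_i^{(g)}\bigr)
  = \bigl(\lambda_i^{(g)}\bigr)^{2}\phi^{(g)} + \theta_i^{(g)}
```

A completely invariant model requires λ, φ, and θ to be equal
across groups, whereas congeneric measurement requires only that
the indicators of a subscale share the same factor ξ.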

When the multiple indicators of a CFA model represent items
from a single measuring instrument, however, it may be of
interest to test for the invariance of item reliabilities. For
example, this procedure was used by Benson (1987) to detect
evidence of item bias in a scale designed to measure
self-concept and racial attitudes for samples of white and
black eighth grade students, and by Munck (1979) to determine
whether the reliabilities of items comprising two attitudinal
measures were equivalent across different nations. In contrast
to the conceptual definition of item bias generally associated
with cognitive instruments (i.e., individuals of equal ability
have unequal probability of success), item bias related to
affective instruments reflects on its validity, and hence, on
the question of whether items generate the same meaning across
groups; evidence of such item bias is a clear indication that
the scores are differentially valid (Green, 1975).

In the present example, the invariance of factor variances
was tested first, in order to establish whether equality
constraints could viably be imposed on the λ and δ for each item
or whether, in light of nonequivalent factor variances,
invariance testing should be based on the ratio of true and
error variances (see Cole & Maxwell, 1985; Rock et al., 1978).
The hypothesis of equivalent factor variances was found tenable
(Δχ² = 5.20, p > .05; see Footnote 10). As such, the reliability
of each item pair (or singleton) was tested for invariance
across grade by imposing equality constraints on the respective
λ and δ parameters; as with previous tests of item scaling units,
equally reliable items were held cumulatively invariant
throughout the testing sequence. These results are summarized
in Table 7.



Insert Table 7 about here 



Tests of invariance proceeded, first, by testing for the
equivalency of each subscale; only the Perceived Cognitive
Competence subscale (PCC) was found to be equivalent across
grade (Δχ²(7) = 8.49, p > .05). Subsequently, the reliability
equivalency of each item pair (or singleton) was tested.¹⁰ Had
tests of invariance revealed the factor variances to be
nonequivalent, on the other hand, it would have been necessary
to test for item reliability by examining the ratio of true and
error score variances (for an explanation of this procedure,
see Munck, 1979; Werts et al., 1976).
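
To make the notion of item reliability concrete, the sketch below
computes it as the share of an item's variance attributable to
the factor, λ²φ / (λ²φ + θ), together with the ratio of true to
error variance, λ²φ / θ, where θ is the item's error variance.
These are the usual definitions for a congeneric indicator with a
single loading and are assumed here rather than quoted from the
paper; the function names are hypothetical. The inputs are
standardized estimates (φ = 1) for three grade 8 items as
reported in Table 5.

```python
def item_reliability(loading, error_var, factor_var=1.0):
    """Proportion of item variance explained by the factor."""
    true_var = loading ** 2 * factor_var
    return true_var / (true_var + error_var)

def true_to_error_ratio(loading, error_var, factor_var=1.0):
    """Ratio of true-score variance to error variance."""
    return loading ** 2 * factor_var / error_var

# Standardized loading and error/uniqueness for selected grade 8 gifted items (Table 5).
grade8 = {"PCC2": (0.66, 0.57), "PCC4": (0.89, 0.21), "PSC4": (0.55, 0.70)}

for item, (lam, theta) in grade8.items():
    rel = item_reliability(lam, theta)
    ratio = true_to_error_ratio(lam, theta)
    print(f"{item}: reliability = {rel:.2f}, true/error ratio = {ratio:.2f}")
```

When factor variances differ across groups, the ratio form is the
quantity the text recommends comparing, since equal λ and equal
error variance no longer imply equal reliability.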

Conclusion 

While the use of LISREL CFA procedures is becoming more
prevalent in construct validity research in general, relatively
few studies have applied this approach to the validation of
single measuring instruments in particular. However, of the
studies that have used the procedure for testing the factorial
validity and invariance of a single instrument, most share four
methodological weaknesses; these relate to the failure: (a) to
determine an adequately specified baseline model, (b) to test
for partial measurement invariance, (c) to test for the
invariance of structural parameters, given partially invariant
item scaling units, and (d) to test for the equivalence of
factor variances prior to testing for the invariance of test
item reliabilities.

The present paper addressed these limitations in an
application to data comprising self-report responses to the
Harter (1982) Perceived Competence Scale for Children by grade
5 and grade 8 normal and gifted children. Specifically, the
paper demonstrated the steps involved in (a) the conduct of
sensitivity analyses to determine a statistically best fitting,
yet substantively most meaningful baseline model, (b) testing
for partial measurement invariance, (c) testing for the
invariance of factor variances and covariances, given partial
measurement invariance, and (d) testing for the invariance of
test item and subscale reliabilities. These procedures,
historically, have received scant attention in the literature.
It is hoped that the present illustration will be helpful in
providing guidelines to future LISREL CFA research bearing on
the construct validity of an assessment instrument.






References

Alwin, D.F. & Jackson, D.J. (1981). Applications of
simultaneous factor analysis to issues of factorial
invariance. In D.J. Jackson & E.F. Borgatta (Eds.), Factor
analysis and measurement in sociological research: A
multidimensional perspective (pp. 249-280). Beverly Hills,
CA: Sage.

Alwin, D.F. & Jackson, D.J. (1980). Measurement models for
response errors in surveys: Issues and applications. In K.F.
Schuessler (Ed.), Sociological methodology (pp. 68-119). San
Francisco: Jossey-Bass.

Bachman, L.F. & Palmer, A.S. (1981). The construct validation
of the FSI Oral Interview. Language Learning, 31, 67-86.

Bagozzi, R.P. (1983). Issues in the application of covariance
structure analysis: A further comment. Journal of Consumer
Research, 9, 449-450.

Benson, J. (1987). Detecting item bias in affective scales.
Educational and Psychological Measurement, 47, 55-67.

Bentler, P.M. (1978). The interdependence of theory,
methodology, and empirical data: Causal modeling as an approach
to construct validation. In D.B. Kandel (Ed.), Longitudinal
research on drug use: Empirical findings and methodological
issues (pp. 267-302). New York: Wiley.

Bentler, P.M. & Bonett, D.G. (1980). Significance tests and
goodness-of-fit in the analysis of covariance structures.
Psychological Bulletin, 88, 588-606.

Browne, M.W. (1982). Covariance structures. In D.M. Hawkins
(Ed.), Topics in applied multivariate analysis (pp. 72-141).

Byrne, B.M. (in press). Measuring adolescent self-concept:
Factorial validity and equivalency of the SDQ III across
gender. Multivariate Behavioral Research.

Byrne, B.M. & Schneider, B.H. (1988). Perceived Competence
Scale for Children: Testing for factorial validity and
invariance across age and ability. Applied Measurement in
Education, 1, 171-187.

Byrne, B.M. & Shavelson, R.J. (1986). On the structure of
adolescent self-concept. Journal of Educational Psychology,
78, 474-481.

Byrne, B.M. & Shavelson, R.J. (1987). Adolescent self-concept:
Testing the assumption of equivalent structure across
gender. American Educational Research Journal, 24, 365-385.

Byrne, B.M., Shavelson, R.J., & Muthén, B. (in press). Testing
for the equivalence of factor covariance and mean
structures: The issue of partial measurement invariance.
Psychological Bulletin.

Carmines, E.G. & McIver, J.P. (1981). Analyzing models with
unobserved variables: Analysis of covariance structures. In
G.W. Bohrnstedt & E.F. Borgatta (Eds.), Social measurement:
Current issues (pp. 65-115). Beverly Hills, CA: Sage.

Cole, D.A. & Maxwell, S.E. (1985). Multitrait-multimethod
comparisons across populations: A confirmatory factor
analytic approach. Multivariate Behavioral Research, 20,
389-417.

Corcoran, M. (1980). Sex differences in measurement error in
status attainment models. Sociological Methods & Research,
9, 199-217.

Flamer, S. (1983). Assessment of the multitrait-multimethod
matrix validity of Likert scales via confirmatory factor
analysis. Multivariate Behavioral Research, 18, 275-308.

Fornell, C. (1983). Issues in the application of covariance
structure analysis: A comment. Journal of Consumer Research,
9, 443-448.

Forsythe, G.B., McGaghie, W.C., & Friedman, C.P. (1986).
Construct validity of medical clinical competence measures:
A multitrait-multimethod matrix study using confirmatory
factor analysis. American Educational Research Journal, 23,
315-336.

Green, D.R. (1975). What does it mean to say a test is biased?
Education and Urban Society, 8, 33-52.

Harter, S. (1982). The Perceived Competence Scale for Children.
Child Development, 53, 87-97.

Hoelter, J.W. (1983). The analysis of covariance structures:
Goodness-of-fit indices. Sociological Methods & Research, 11,
325-344.

Huba, G.J. & Harlow, L.L. (1987). Robust structural equation
models: Implications for developmental psychology. Child
Development, 58, 147-166.

Huba, G.J., Wingard, J.A., & Bentler, P.M. (1981). A comparison
of two latent variable causal models for adolescent drug
use. Journal of Personality and Social Psychology, 40,
180-193.

Jöreskog, K.G. (1969). A general approach to confirmatory
maximum likelihood factor analysis. Psychometrika, 34,
183-202.

Jöreskog, K.G. (1971a). Simultaneous factor analysis in several
populations. Psychometrika, 36, 409-426.

Jöreskog, K.G. (1971b). Statistical analysis of sets of
congeneric tests. Psychometrika, 36, 109-133.

Jöreskog, K.G. (1982). Analysis of covariance structures. In C.
Fornell (Ed.), A second generation of multivariate analysis
Vol 1: Methods (pp. 200-242). New York: Praeger.

Jöreskog, K.G. (1983). UK LISREL Workshop, University of
Edinburgh, Scotland.

Jöreskog, K.G. & Sörbom, D. (1985). LISREL VI: Analysis of
linear structural relationships by the method of maximum
likelihood. Mooresville, IN: Scientific Software.

Lomax, R.G. (1982). A guide to LISREL-type structural equation
modeling. Behavior Research Methods & Instrumentation, 14,
1-8.

Long, J.S. (1983). Confirmatory factor analysis. Beverly Hills,
CA: Sage.

MacCallum, R. (1986). Specification searches in covariance
structure modeling. Psychological Bulletin, 100, 107-120.

Mare, R.D. & Mason, W.M. (1980). Children's reports of parental
socioeconomic status: A multiple group measurement model.
Sociological Methods & Research, 9, 178-198.

Marsh, H.W. (1985). The structure of masculinity/femininity: An
application of confirmatory factor analysis to higher-order
factor structures and factorial invariance. Multivariate
Behavioral Research, 20, 427-449.

Marsh, H.W. (1987a). The hierarchical structure of self-concept
and the application of hierarchical confirmatory factor
analysis. Journal of Educational Measurement, 24, 17-39.

Marsh, H.W. (1987b). Masculinity, femininity and androgyny:
Their relations with multiple dimensions of self-concept.
Multivariate Behavioral Research, 22, 91-118.

Marsh, H.W., Barnes, J., Cairns, L., & Tidman, M. (1984).
Self-Description Questionnaire: Age and sex effects in the
structure and level of self-concept for preadolescent
children. Journal of Educational Psychology, 76, 940-956.

Marsh, H.W. & Hocevar, D. (1984). The factorial invariance of
student evaluations of college teaching. American
Educational Research Journal, 21, 341-366.

Marsh, H.W. & Hocevar, D. (1985). Application of confirmatory
factor analysis to the study of self-concept: First- and
higher order factor models and their invariance across
groups. Psychological Bulletin, 97, 562-582.

Marsh, H.W. & O'Neill, R. (1984). Self Description
Questionnaire III: The construct validity of multidimensional
self-concept ratings by late adolescents. Journal of
Educational Measurement, 21, 153-174.

Marsh, H.W., Smith, I.D., & Barnes, J. (1985). Multidimensional
self-concepts: Relations with sex and academic achievement.
Journal of Educational Psychology, 77, 581-596.

Munck, I.M.E. (1979). Model building in comparative education:
Applications of the LISREL method to cross-national survey
data. Stockholm: Almqvist & Wiksell International.

Muthén, B. & Christoffersson, A. (1981). Simultaneous factor
analysis of dichotomous variables in several groups.
Psychometrika, 46, 407-419.

Muthén, B. & Kaplan, D. (1985). A comparison of methodologies
for the factor analysis of non-normal Likert variables.
British Journal of Mathematical and Statistical Psychology,
38, 171-189.

Newcomb, M.D., Huba, G.J., & Bentler, P.M. (1986). Determinants
of sexual and dating behaviors among adolescents. Journal of
Personality and Social Psychology, 50, 428-438.

Reynolds, C.R. & Harding, R.R. (1983). Outcome in two large
sample studies of factorial similarity under six methods of
comparison. Educational and Psychological Measurement, 43,
723-728.

Rock, D.A., Werts, C.E., & Flaugher, R.L. (1978). The use of
analysis of covariance structures for comparing the
psychometric properties of multiple variables across
populations. Multivariate Behavioral Research, 13, 403-418.

Schneider, B.H., Clegg, M.R., Byrne, B.M., Ledingham, J.E., &
Crombie, G. (in press). Social relations of gifted children
as a function of age and school program. Journal of
Educational Psychology.

Sobel, M.E. & Bohrnstedt, G.W. (1985). Use of null models in
evaluating the fit of covariance structure models. In N.B.
Tuma (Ed.), Sociological methodology (pp. 152-178). San
Francisco: Jossey-Bass.

Tanaka, J.S. (1987). "How big is big enough?": Sample size and
goodness of fit in structural equation models with latent
variables. Child Development, 58, 134-146.

Tanaka, J.S. & Huba, G.J. (1984). Confirmatory hierarchical
factor analyses of psychological distress measures. Journal
of Personality and Social Psychology, 46, 621-635.

Tanaka, J.S. & Huba, G.J. (1985). A fit index for covariance
structure models under arbitrary GLS estimation. British
Journal of Mathematical and Statistical Psychology, 38,
197-201.

Tucker, L.R. & Lewis, C. (1973). A reliability coefficient for
maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Watkins, D. & Hattie, J. (1981). An investigation of the
construct validity of three recently developed personality
instruments: An application of confirmatory multimethod
factor analysis. Australian Journal of Psychology, 33,
277-284.

Werts, C.E., Rock, D.A., Linn, R.L., & Jöreskog, K.G. (1976).
Comparison of correlations, variances, covariances, and
regression weights with or without measurement error.
Psychological Bulletin, 83, 1007-1013.

Wolfle, L.M. (1981, April). Causal models with unmeasured
variables: An introduction to LISREL. Paper presented at the
American Educational Research Association Annual Meeting,
Los Angeles.

Wolfle, L.M. & Robertshaw, D. (1983). Racial differences in
measurement error in educational achievement models. Journal
of Educational Measurement, 20, 39-49.

Footnotes 

1. If tests of factor means are of interest, the measurement
model would also include the regression intercept (nu, ν), a
vector of constant intercept terms. In the basic CFA model,
however, variable means are not of interest since they are
neither structured nor explained by the constructs (Bentler,
1978).

2. For the same reason as noted in Footnote 1, the gamma (Γ), a
vector of mean estimates, is not included in the structural
model.

3. Secondary loadings are measurement loadings on more than one
factor.

4. The absolute χ²/df ratio value that represents a reasonable
fit to the data remains a controversial issue. For example,
Muthén (personal communication, October 1987) contends that
a χ²/df ratio >1.50 indicates a malfitting model for data
that are normed to a sample size of 1000. On the other hand,
Carmines and McIver (1981) argue that an acceptable χ²/df
ratio can range as high as 3.00. Taking a midpoint between
these two extremes, it seems likely that, with sample sizes
less than 1000, a coefficient >2.00 is a fairly good
indication of model misfit.



5. This post hoc fitting procedure has been referred to as
tests for "substantive invariance" (Tanaka & Huba, 1984) and
as "sensitivity analyses" (Byrne et al., in press).

6. Mean skewness and kurtosis values were as follows: normal
(grade 5, SK = -.47, KU = -.70; grade 8, SK = -.38, KU =
[illegible]); gifted (grade 5, SK = [illegible], KU = -.50;
grade 8, SK = [illegible], KU = .01).

7. The reader is advised that if start values were included in
the initial input, these will likely need to be increased in
order to make them compatible with covariance, rather than
correlation values.

8. Since χ² and its corresponding degrees of freedom are
additive, the sum of χ²s (see Table 6) reflects how well the
underlying factor structure fits the data across groups.

9. This model was compared with one in which all items known to
be invariant were constrained equal across grade (Model
12, see Table 6).

10. Although the PCC subscale, as a whole, was found to be
invariant, tests of individual item parameters revealed the
first item pair (PCC1) to be noninvariant; this illustrates
the possibility of masking information when analyses are
conducted at the more macroscopic subscale level.
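
The χ²/df rule of thumb discussed in Footnote 4 is simple to
apply. The sketch below is illustrative; the function names and
the default cutoff argument are assumptions, with 2.00 taken from
the footnote's suggested midpoint. The χ² and df values are those
of the first and last grade 5 gifted models in Table 3.

```python
def chi2_df_ratio(chi2, df):
    """Return the chi-square to degrees-of-freedom ratio."""
    return chi2 / df

def indicates_misfit(chi2, df, cutoff=2.00):
    """Flag a model whose chi-square/df ratio exceeds the chosen cutoff."""
    return chi2_df_ratio(chi2, df) > cutoff

# Grade 5 gifted sample (Table 3): basic 4-factor model and final baseline model.
for label, chi2, df in [("Model 1", 160.43, 98), ("Model 3", 116.87, 96)]:
    ratio = chi2_df_ratio(chi2, df)
    print(f"{label}: chi2/df = {ratio:.2f}, misfit flagged: {indicates_misfit(chi2, df)}")
```

The cutoff is left as a parameter because, as the footnote makes
clear, no single value is agreed upon.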






Table 1

Pattern of LISREL Parameters for Model Fitting

Λx
          ξ1       ξ2       ξ3       ξ4
PGS1      1.0ᵃ     0        0        0
PGS2      λ21      0        0        0
PGS3      λ31      0        0        0
PGS4      λ41      0        0        0
PCC1      0        1.0ᵃ     0        0
PCC2      0        λ62      0        0
PCC3      0        λ72      0        0
PCC4      0        λ82      0        0
PSC1      0        0        1.0ᵃ     0
PSC2      0        0        λ10,3    0
PSC3      0        0        λ11,3    0
PSC4      0        0        λ12,3    0
PPC1      0        0        0        1.0ᵃ
PPC2      0        0        0        λ14,4
PPC3      0        0        0        λ15,4
PPC4      0        0        0        λ16,4

Φ
          PGS      PCC      PSC      PPC
PGS       φ11
PCC       φ21      φ22
PSC       φ31      φ32      φ33
PPC       φ41      φ42      φ43      φ44

Θδ
PGS1      δ11
PGS2      0  δ22
PGS3      0  0  δ33
PGS4      0  0  0  δ44
PCC1      0  0  0  0  δ55
PCC2      0  0  0  0  0  δ66
PCC3      0  0  0  0  0  0  δ77
PCC4      0  0  0  0  0  0  0  δ88
PSC1      0  0  0  0  0  0  0  0  δ99
PSC2      0  0  0  0  0  0  0  0  0  δ10,10
PSC3      0  0  0  0  0  0  0  0  0  0  δ11,11
PSC4      0  0  0  0  0  0  0  0  0  0  0  δ12,12
PPC1      0  0  0  0  0  0  0  0  0  0  0  0  δ13,13
PPC2      0  0  0  0  0  0  0  0  0  0  0  0  0  δ14,14
PPC3      0  0  0  0  0  0  0  0  0  0  0  0  0  0  δ15,15
PPC4      0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  δ16,16

ᵃFixed parameter

X = observed item measures for the Perceived Competence Scale for Children
(PCSC); ξ1-ξ4 = perceived competence subscales (i.e., factors) of the PCSC
(ξ1 = perceived general self; ξ2 = perceived cognitive competence; ξ3 =
perceived social competence; ξ4 = perceived physical competence); Λx = factor
loading matrix; Φ = factor variance-covariance matrix; Θδ = error
variance-covariance matrix. PGS1-PGS3 = paired items #4/8, 12/16, 20/24
measuring perceived general self (PGS); PGS4 = item #28 measuring PGS;
PCC1-PCC3 = paired items #1/5, 9/13, 17/21 measuring perceived cognitive
competence (PCC); PCC4 = item #25 measuring PCC; PSC1-PSC3 = paired items
#2/6, 10/14, 18/22 measuring perceived social competence (PSC); PSC4 = item
#26 measuring PSC; PPC1-PPC3 = paired items #3/7, 11/15, 19/23 measuring
perceived physical competence (PPC); PPC4 = item #27 measuring PPC.



Table 2

Steps in Model Fitting for the Normal Sample

Competing Models                       χ²       df    p     Δχ²        Δdf   χ²/df

Grade 5
1  Basic 4-factor model                152.26   98    .00   —          —     1.55
2  Model 1 with correlated error       139.45   97    .00   12.81***   1     1.44
   between PPC4 and PSC3
3  Model 2 with correlated error       126.53   96    .02   12.92***   1     1.32
   between PSC4 and PSC2
4  Model 3 with correlated error       117.57   95    .06   8.96**     1     1.24
   between PCC1 and PGS3

Grade 8
1  Basic 4-factor modelᵃ               132.13   98    .01   —          —     1.35
2  Model 1 with correlated error       120.55   97    .05   11.58***   1     1.24
   between PGS4 and PGS3

**p < .01   ***p < .001

ᵃFinal model considered as baseline

PPC4 = item #27 measuring perceived physical competence; PSC3 = paired items #18
and #22 measuring perceived social competence; PSC4 = item #26 measuring
perceived social competence; PSC2 = paired items #10 and #14 measuring perceived
social competence; PCC1 = paired items #1 and #5 measuring perceived cognitive
competence; PGS3 = paired items #20 and #24 measuring perceived general self;
PGS4 = item #28 measuring perceived general self.






Table 3

Steps in Model Fitting for the Gifted Sample

Competing Models                       χ²       df    p     Δχ²        Δdf   χ²/df

Grade 5
1  Basic 4-factor model                160.43   98    .00   —          —     1.64
2  Model 1 with PPC4 loading           134.86   97    .00   25.57***   1     1.39
   on PSC
3  Model 2 with PGS4 loading           116.87   96    .07   17.99***   1     1.22
   on PCCᵃ

Grade 8
1  Basic 4-factor model                197.77   98    .00   —          —     2.02
2  Model 1 with PPC4 loading           175.16   97    .00   22.61***   1     1.81
   on PSC
3  Model 2 with correlated             149.42   96    .00   25.74***   1     1.56
   error between PCC4 and PCC1
4  Model 3 with PGS4 loading           129.35   95    .01   20.07***   1     1.36
   on PCC
5  Model 4 with PGS2 loading           115.21   94    .07   14.14***   1     1.23
   on PSCᵃ

***p < .001

ᵃFinal model considered as baseline

PSC = perceived social competence factor; PCC = perceived cognitive competence
factor; PPC4 = item #27 measuring perceived physical competence; PGS4 = item #28
measuring perceived general self; PCC4 = item #25 measuring perceived cognitive
competence; PCC1 = paired items #1 and #5 measuring perceived cognitive
competence; PGS2 = paired items #12 and #16 measuring perceived general self.






Table 4

Baseline Model Parameter Estimates for Grade 5 Giftedᵃ

Measured                Subscale Factors
Item Variablesᵇ    PGS    PCC    PSC    PPC    Error/Uniqueness

PGS1               .72    0      0      0      [illegible]
PGS2               .85    0      0      0      [illegible]
PGS3               .83    0      0      0      [illegible]
PGS4               .22    .46    0      0      [illegible]
PCC1               0      .72    0      0      [illegible]
PCC2               0      .69    0      0      [illegible]
PCC3               0      .69    0      0      [illegible]
PCC4               0      .73    0      0      .47
PSC1               0      0      .78    0      [illegible]
PSC2               0      0      .66    0      [illegible]
PSC3               0      0      .76    0      [illegible]
PSC4               0      0      .61    0      [illegible]
PPC1               0      0      0      .76    [illegible]
PPC2               0      0      0      .79    [illegible]
PPC3               0      0      0      .82    .33
PPC4               0      0      .47    .30    [illegible]

Subscale (Factor) Correlations

                   PGS    PCC    PSC    PPC
PGS
PCC                .56
PSC                .61    .42
PPC                .31    .33    .43

ᵃFactor loadings and factor correlations are presented in standardized form to
facilitate interpretation.
ᵇItem variables 1-3 represent the first six items of each subscale, paired
consecutively; item variable 4 represents the seventh item of each subscale.
PGS = perceived general self; PCC = perceived cognitive competence; PSC =
perceived social competence; PPC = perceived physical competence.




Table 5

Baseline Model Parameter Estimates for Grade 8 Giftedᵃ

Measured                Subscale Factors
Item Variablesᵇ    PGS    PCC            PSC    PPC    Error/Uniqueness

PGS1               .88    0              0      0      [illegible]
PGS2               .63    0              .28    0      [illegible]
PGS3               .91    0              0      0      [illegible]
PGS4               .58    .30            0      0      [illegible]
PCC1               0      [illegible]    0      0      [illegible]
PCC2               0      .66            0      0      .57
PCC3               0      .65            0      0      .58
PCC4               0      .89            0      0      .21
PSC1               0      0              .82    0      .33
PSC2               0      0              .83    0      .32
PSC3               0      0              .87    0      .24
PSC4               0      0              .55    0      .70
PPC1               0      0              0      .83    .31
PPC2               0      0              0      .89    .22
PPC3               0      0              0      .37    .22
PPC4               0      0              .37    .55    .38

Subscale (Factor) Correlations

                   PGS    PCC    PSC    PPC
PGS
PCC                .33
PSC                .43    .16
PPC                .40    .15    .45

ᵃFactor loadings and factor correlations are presented in standardized form to
facilitate interpretation.
ᵇItem variables 1-3 represent the first six items of each subscale, paired
consecutively; item variable 4 represents the seventh item of each subscale.
PGS = perceived general self; PCC = perceived cognitive competence; PSC =
perceived social competence; PPC = perceived physical competence.




Table 6

Tests for Invariance of Item Scaling Units Across Grade for the Gifted

Competing Model                      χ²       df    Δχ²        Δdf   χ²/df

 1  Four perceived competence        232.08   190   —          —     1.22
    factors invariant
 2  Model 1 with all factor          271.01   204   38.93***   14    1.33
    loadings invariantᵃ
 3  Model 1 with 2 common            237.18   192   5.10       2     1.24
    secondary loadings
    invariant
 4  Model 3 with all PGS             256.74   195   24.66***   5     1.32
    factor loadings invariant
 5  Model 3 with PGS2                254.33   193   22.25***   3     1.32
    invariant
 6  Model 3 with PGS3                239.47   193   7.39       3     1.24
    invariant
 7  Model 3 with PGS3,               240.37   194   8.29       4     1.24
    PGS4 invariant
 8  Model 7 with all PCC             244.35   197   12.27      7     1.24
    factor loadings invariant
 9  Model 8 with all PSC             251.37   200   19.29*     10    1.26
    factor loadings invariant
10  Model 8 with PSC2                245.20   198   13.12      8     1.24
    invariant
11  Model 8 with PSC2,               245.45   199   13.37      9     1.23
    PSC3 invariant
12  Model 11 with all PPC            248.69   202   16.61      12    1.23
    factor loadings invariant

*p < .05   ***p < .001

ᵃIncluding the 2 common secondary factor loadings
ᵇThe first item-pair loading for each factor was fixed to 1.0 for purposes of
statistical identification. PGS = perceived general self; PCC = perceived
cognitive competence; PSC = perceived social competence; PPC = perceived
physical competence.




Table 7

Tests for Invariance of Subscale and Item Reliabilities Across Grade for the Gifted

Competing Model                                  χ²     df      Δχ²      Δdf   χ²/df

 1. Two common secondary factor loadings       237.18   192      --       --    1.24
    invariant (λ16,3 and λ[?],2)

Subscales

 2. PGS subscale: Model 1 with λ11 to λ41      269.84   199    32.66***    7    1.36
    and δ11 to δ44 invariant

 3. PCC subscale: Model 1 with λ52 to λ82      245.67   199     8.49       7    1.23
    and δ55 to δ88 invariant

 4. PSC subscale: Model 1 with λ93 to λ12,3    272.83   206    35.65**    14    1.32
    and δ99 to δ12,12 invariant

 5. PPC subscale: Model 1 with λ13,4 to        269.29   206    32.11**    14    1.31
    λ16,4 and δ13,13 to δ16,16 invariant

Items

 6. Model 1 with λ11 and δ11 invariant         241.23   193(a)  4.05*      1    1.25

 7. Model 1 with λ21 and δ22 invariant         254.76   194    17.58***    2    1.31

 8. Model 1 with λ31 and δ33 invariant         246.48   194     9.30**     2    1.27

 9. Model 1 with λ41 and δ44 invariant         242.88   194     5.70       2    1.25

10. Model 9 with λ52 and δ55 invariant         245.56   195(a)  8.38*      3    1.26

11. Model 9 with λ62 and δ66 invariant         244.82   196     7.64       4    1.25

12. Model 11 with λ72 and δ77 invariant        245.08   198     7.90       6    1.24

13. Model 12 with λ82 and δ88 invariant        249.19   200    12.01       8    1.25

14. Model 13 with λ93 and δ99 invariant        249.19   201(a) 12.01       9    1.24

15. Model 14 with λ10,3 and δ10,10             254.92   203    17.74      11    1.26
    invariant

16. Model 15 with λ11,3 and δ11,11             265.15   205    27.97**    13    1.29
    invariant

17. Model 15 with λ12,3 and δ12,12             266.52   205    29.34**    13    1.30
    invariant

18. Model 15 with λ13,4 and δ13,13             258.14   204(a) 20.96      12    1.27
    invariant

19. Model 18 with λ14,4 and δ14,14             266.25   206    29.07*     14    1.29
    invariant

20. Model 18 with λ15,4 and δ15,15             261.04   206    23.86*     14    1.27
    invariant

21. Model 18 with λ16,4 and δ16,16             264.40   206    27.22*     14    1.28
    invariant

*p < .05   **p < .01   ***p < .001

(a) Difference in degrees of freedom equals one due to the first loading for each
    factor being fixed to 1.00.

PGS = perceived general self; PCC = perceived cognitive competence; PSC = perceived
social competence; PPC = perceived physical competence.
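The Δdf pattern in Table 7 follows from the footnote on fixed loadings. Holding an item's reliability invariant constrains its loading and its error variance, which adds two degrees of freedom, unless the loading is the one fixed to 1.00 for identification, in which case only the error variance adds a degree of freedom. The sketch below counts these constraints. It assumes 16 item-pairs with items 1, 5, 9, and 13 as the fixed reference loadings, which is inferred from the parameter subscripts rather than stated in the table, and `delta_df` is a hypothetical helper name.

```python
# Assumed layout (inferred, not stated in the table): the first item-pair of
# each of the four factors has its loading fixed to 1.00.
REFERENCE_ITEMS = frozenset({1, 5, 9, 13})


def delta_df(items, reference_items=REFERENCE_ITEMS):
    """Count equality constraints added when the listed items' loadings and
    error variances are held invariant across groups."""
    n_error_variances = len(items)
    # A loading fixed to 1.00 in both groups is already equal, so it adds nothing.
    n_free_loadings = sum(1 for item in items if item not in reference_items)
    return n_free_loadings + n_error_variances


print(delta_df([1]))           # item 1 alone (Model 6): 1
print(delta_df([2]))           # item 2 alone (Model 7): 2
print(delta_df([1, 2, 3, 4]))  # whole PGS subscale (Model 2): 7
```

These counts agree with the Δdf values printed for Models 6, 7, and 2.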






